Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting
Keyword spotting (KWS) constitutes a major component of human-technology
interfaces. Maximizing the detection accuracy at a low false alarm (FA) rate,
while minimizing the footprint size, latency and complexity are the goals for
KWS. Towards achieving them, we study Convolutional Recurrent Neural Networks
(CRNNs). Inspired by large-scale state-of-the-art speech recognition systems,
we combine the strengths of convolutional layers and recurrent layers to
exploit local structure and long-range context. We analyze the effect of
architecture parameters, and propose training strategies to improve
performance. With only ~230k parameters, our CRNN model yields acceptably low
latency, and achieves 97.71% accuracy at 0.5 FA/hour for 5 dB signal-to-noise
ratio.
Comment: Accepted to Interspeech 201
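The operating point quoted in this abstract (accuracy at 0.5 FA/hour) can be read as: pick the detection threshold so that false alarms on non-keyword audio stay within the budget per hour, then report the fraction of true keywords still detected. The following is a minimal sketch of that calculation, assuming one confidence score per example; the function names and the toy scores are hypothetical and do not come from the paper.

```python
def false_alarms_per_hour(negative_scores, threshold, negative_hours):
    """False alarms (negatives scoring at or above threshold) per hour of negative audio."""
    false_alarms = sum(1 for s in negative_scores if s >= threshold)
    return false_alarms / negative_hours


def accuracy_at_fa_rate(positive_scores, negative_scores, negative_hours, max_fa_per_hour):
    """Return (threshold, detection_rate) at the lowest threshold that meets the FA budget.

    The FA rate can only fall as the threshold rises, so the first candidate
    that satisfies the budget also gives the highest detection rate.
    """
    candidates = sorted(set(positive_scores) | set(negative_scores))
    for threshold in candidates:
        fa_rate = false_alarms_per_hour(negative_scores, threshold, negative_hours)
        if fa_rate <= max_fa_per_hour:
            detected = sum(1 for s in positive_scores if s >= threshold)
            return threshold, detected / len(positive_scores)
    # No observed score meets the budget: the threshold must sit above every score.
    return None, 0.0


# Toy data (hypothetical): 4 keyword examples and 4 negatives drawn from 2 hours of audio.
positives = [0.9, 0.8, 0.4, 0.95]
negatives = [0.1, 0.3, 0.5, 0.85]
threshold, accuracy = accuracy_at_fa_rate(positives, negatives, 2.0, 0.5)
print(threshold, accuracy)
```

With the toy data, one false alarm in two hours is the most the 0.5 FA/hour budget allows, so the threshold settles at 0.8 and three of the four keywords are detected.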
SlimPajama-DC: Understanding Data Combinations for LLM Training
This paper aims to understand the impacts of various data combinations (e.g.,
web text, wikipedia, github, books) on the training of large language models
using SlimPajama. SlimPajama is a rigorously deduplicated, multi-source
dataset, which has been refined and further deduplicated to 627B tokens from
the extensive 1.2T-token RedPajama dataset contributed by Together. We term
our research SlimPajama-DC, an empirical analysis designed to uncover
fundamental characteristics and best practices associated with employing
SlimPajama in the training of large language models. During our research with
SlimPajama, two pivotal observations emerged: (1) Global deduplication vs.
local deduplication. We analyze and discuss how global (across different
sources of datasets) and local (within the single source of dataset)
deduplications affect the performance of trained models. (2) Proportions of
high-quality/highly-deduplicated multi-source datasets in the combination. To
study this, we construct six configurations of the SlimPajama dataset and train
a separate 1.3B Cerebras-GPT model with ALiBi and SwiGLU on each. Our best
configuration outperforms the 1.3B model trained on RedPajama using the same
number of training tokens by a significant margin. All our 1.3B models are
trained on Cerebras 16 CS-2 cluster with a total of 80 PFLOP/s in bf16
mixed precision. We further extend our discoveries (such as increasing data
diversity is crucial after global deduplication) on a 7B model with large
batch-size training. Our models and the separate SlimPajama-DC datasets are
available at: https://huggingface.co/MBZUAI-LLM and
https://huggingface.co/datasets/cerebras/SlimPajama-627B.
Comment: Technical report.
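The distinction this abstract draws between local deduplication (within a single source) and global deduplication (across all sources) can be illustrated with a small sketch. It assumes exact matching on a normalized-text hash, which is a simplification chosen for brevity; the abstract does not state the matching method, and the function names and toy sources are hypothetical.

```python
import hashlib


def fingerprint(text):
    """Hash of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def local_dedup(sources):
    """Drop duplicates within each source; copies in other sources survive."""
    result = {}
    for name, docs in sources.items():
        seen = set()  # reset for every source
        kept = []
        for doc in docs:
            fp = fingerprint(doc)
            if fp not in seen:
                seen.add(fp)
                kept.append(doc)
        result[name] = kept
    return result


def global_dedup(sources):
    """Drop duplicates across all sources; the first source listed keeps its copy."""
    seen = set()  # shared by all sources
    result = {}
    for name, docs in sources.items():
        kept = []
        for doc in docs:
            fp = fingerprint(doc)
            if fp not in seen:
                seen.add(fp)
                kept.append(doc)
        result[name] = kept
    return result


# Toy corpus (hypothetical): "Hello World" appears in both sources.
sources = {
    "web": ["Hello world", "hello  world", "Foo"],
    "wiki": ["Hello World", "Bar"],
}
print(local_dedup(sources))
print(global_dedup(sources))
```

In this sketch the order of the sources decides which copy survives global deduplication, so the per-source proportions after deduplication depend on that order as well as on the overlap between sources.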
Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models
We introduce Jais and Jais-chat, new state-of-the-art Arabic-centric
foundation and instruction-tuned open generative large language models (LLMs).
The models are based on the GPT-3 decoder-only architecture and are pretrained
on a mixture of Arabic and English texts, including source code in various
programming languages. With 13 billion parameters, they demonstrate better
knowledge and reasoning capabilities in Arabic than any existing open Arabic
and multilingual models by a sizable margin, based on extensive evaluation.
Moreover, the models are competitive in English compared to English-centric
open models of similar size, despite being trained on much less English data.
We provide a detailed description of the training, the tuning, the safety
alignment, and the evaluation of the models. We release two open versions of
the model -- the foundation Jais model, and an instruction-tuned Jais-chat
variant -- with the aim of promoting research on Arabic LLMs. Available at
https://huggingface.co/inception-mbzuai/jais-13b-chat
Comment: Arabic-centric, foundation model, large-language model, LLM,
generative model, instruction-tuned, Jais, Jais-chat
Reducing the SPEC2006 Benchmark Suite for Simulation-Based Computer Architecture Research
Present-day computer architects use advanced microarchitecture simulators to test the performance of processor designs. The simulator workloads are generally benchmarks, which are representative of specific types of real-world applications. Because microarchitecture implementations increase in complexity and the simulation workloads are required to represent complicated applications, the simulation time has greatly increased. To solve the problem, researchers are looking into ways to reduce the amount of time benchmarks run, while maintaining the same workload characterization as the longer benchmarks. MinneSPEC is a representative reduction of SPEC2000, with the reduced input sets found using SimpleScalar profiling tools [1]. With the release of SPEC CPU2006, new benchmarks have been added to the SPEC benchmarking suite, which will be used to evaluate performance in tomorrow's microprocessors. These benchmarks are considerably larger than SPEC2000, and using SimpleScalar to profile their workloads would take a large amount of time and effort. This paper suggests a different reduction technique, which gathers profiling information using processor performance counters accessed through PAPI. Since workloads are running on a native system instead of a simulator, profiling information can be gathered in a much shorter amount of time. This allows for fine-grained tuning of reduced input sets, so more representative reduced benchmarks can be found in a much shorter amount of time. Using this technique, we were able to reduce five SPEC2006 benchmarks to under 1